A Tufte Handout On Hypothesis Testing and
Confidence Intervals for the Mean of Quantities
Hypothesis Testing and Confidence Intervals
Robert W. Walker
2023-02-24
cars data
I will work with R’s internal dataset on cars: cars.
There are two variables in the dataset, speed [speed] and distance
[dist] this is what they look like.
Null equal and alternative two-sided [not equal]: \(\mu = target\) vs. \(\mu \neq target\)
Null less than or equal and alternative greater: \(\mu \leq target\) vs. \(\mu > target\)
Null greater than or equal and alternative lesser: \(\mu \geq target\) vs. \(\mu < target\)
A confidence level that is a threshold in
probability for rendering a true/false answer.
Hypothesis Testing and \(t\)
I will work with the speed variable. On a basic level, a hypothesis
is a specific value for the average here. For whatever reason, \(\mu=17\) motivates us. I consider all three
distinct hypotheses though this is not entirely proper in the sense that
any given application of hypothesis testing should have a clear
hypothesis about the variable of interest rather than waffling amongst
all of them as I will. I do so only for illustrative purposes. To begin,
I need to specify both the hypothesis and the confidence level that I
intend to use to evaluate it.
Case 1: The hypothesis to advance is that 17 is the
true average speed. The alternative must then be that the average speed
is not 17; it could be smaller or larger than 17. I wish to evaluate
this claim with 90% confidence.
Case 2: The hypothesis to advance is that 17 or
greater is the true average speed. The alternative must then be that the
average speed is less than 17. I wish to evaluate this claim with 90%
confidence.
Case 3: The hypothesis to advance is that 17 or
less is the true average speed. The alternative must then be that the
average speed is greater than 17. I wish to evaluate this claim with 90%
confidence.
Before getting to equations, something is important to note.
The mean of the data, \(\overline{x}\),
\(s\), the standard deviation, and
\(n\) are all known from the data.
There are then only two remaining unknowns, either \(t\) or \(\mu\).
One further algebraic manipulation before starting. Let’s solve for
\(\mu\).
This fully defines the parts we require and the core problem because
the data essentially gives us almost all of the unknowns. In some very
basic way, conditional on data, only \(\mu\) or \(t\) from a given probability remain
unknown.
There are 2.14 standard errors between the mean that we obtain
from the data and the hypothetical mean of 17; the value from the data
is smaller, hence the negative.
The middle 90% of the simulated
means range from 14.2 to 16.6.
Critical Values
P(-1.677 < X < 1.677) = 0.9
1 - P(-1.677 < X < 1.677) = 0.1
Result: The p-value
P(X < -2.14) = 0.019
P(X > 2.14) = 0.019
P(-2.14 < X < 2.14) = 0.963
1 - P(-2.14 < X < 2.14) = 0.037
t.test
Alternative: Two-sided
t.test(cars$speed, mu=17, conf.level = 0.9)
##
## One Sample t-test
##
## data: cars$speed
## t = -2.1397, df = 49, p-value = 0.03739
## alternative hypothesis: true mean is not equal to 17
## 90 percent confidence interval:
## 14.1463 16.6537
## sample estimates:
## mean of x
## 15.4
Knowing only the sample size, 50, is sufficient to determine what
\(t\) must be to reject a mean of 17 in
favor of the alternative that the true mean is not 17. With 0.9
probability, this implies two boundaries; either the true mean is
smaller than 17, with 0.05 probability spanning 0 to 0.05 and it is
bigger than 17 with 0.05 probability spanning 0.95 to 1 so that the
interior range represents 0.9 probability as required.
Critical Values: The sample mean would have to
be at least \(\pm\) 1.677
standard errors away from 17 to rule out a mean of 17 with 90%
confidence.
Result: Our sample mean is 2.14
standard errors away from 17. The probability of something equally or
more extreme is 0.037. This is
known as the p-value..
##
## One Sample t-test
##
## data: cars$speed
## t = -2.1397, df = 49, p-value = 0.01869
## alternative hypothesis: true mean is less than 17
## 90 percent confidence interval:
## -Inf 16.37143
## sample estimates:
## mean of x
## 15.4
radiant
Alternative: Less
result <-
single_mean( cars, var = “speed”, comp_value = 17, alternative = “less”,
conf_lev = 0.9 ) summary(result) # plot(result, plots = “hist”, custom =
FALSE)
Analytics
Knowing only the sample size, 50, is sufficient to determine what
\(t\) must be to reject a mean of 17 or
greater in favor of the alternative that the true mean is less than 17.
With 0.9 probability, either the true mean is smaller than 17, with 0.1
probability or it is 17 or bigger with 0.9 probability as required.
Critical Values: The sample mean would have to
be at most -1.299 standard errors away from 17 to rule
out a mean of 17 or greater with 90% confidence.
Result: Our sample mean is
-2.14 standard errors away from 17. The probability of
something equal or lesser is 0.019.
This is known as the p-value.
##
## One Sample t-test
##
## data: cars$speed
## t = -2.1397, df = 49, p-value = 0.9813
## alternative hypothesis: true mean is greater than 17
## 90 percent confidence interval:
## 14.42857 Inf
## sample estimates:
## mean of x
## 15.4
radiant
Alternative: Greater
result <- single_mean(
cars,
var = "speed",
comp_value = 17,
alternative = "greater",
conf_lev = 0.9
)
summary(result)
## Single mean test
## Data : cars
## Variable : speed
## Confidence: 0.9
## Null hyp. : the mean of speed = 17
## Alt. hyp. : the mean of speed is > 17
##
## mean n n_missing sd se me
## 15.400 50 0 5.288 0.748 1.254
##
## diff se t.value p.value df 10% 100%
## -1.6 0.748 -2.14 0.981 49 14.429 Inf
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# plot(result, plots = "hist", custom = FALSE)
Analytics
Knowing only the sample size, 50, is sufficient to determine what
\(t\) must be to reject a mean of 17 or
less in favor of the alternative that the true mean is greater than 17.
With 0.9 probability, the true mean is bigger than 17 and with 0.1
probability, it is smaller as required.
Critical Values: The sample mean would have to
be at most 1.299 standard errors away from 17 to rule
out a mean of 17 or less with 90% confidence.
Result: Our sample mean is
-2.14 standard errors away from 17. The probability of
something equal or greater is 0.981.
This is known as the p-value.
Interpreting the result, the sample mean is 2.14 standard errors
below the hypothetical mean of 17. Any \(t\) less than -1.299 would have been
sufficient to reject the claim that \(\mu \geq
17\) and conclude that it must be less than 17 with 90%
confidence.
I could also use the second manipulation above to study this in the
original metric. We would accept [or fail to reject] the claim that
\(\mu \geq 17\) if \(t \geq -1.299\). Let’s plug this in.
Put in words, we could sustain the belief that \(\mu \geq 17\) with 90% confidence as long
as we see a sample mean of 16.03 or greater given this standard
deviation and sample size. Ours is 15.4 so we must reject the claim that
\(\mu \geq 17\) with 90%
confidence.
plot(17 + 0.7478*seq(-5,5, by=0.01), dt(seq(-5,5, by=0.01), df=49), xlab=expression(paste(mu)), main="measured in std. errors of the mean", ylab="Density", type="l")
A p-value
The probability of a sample mean of 15.4 [or smaller] given a true
average of 17, this standard deviation and sample size is
pt(-2.14, 49) = 0.0186798. This is referred to as a
p-value. Notice that this probability is less than 0.1;
thus with at least 90% confidence, the true mean is not 17 or greater
and thus must be smaller because this probability is less than 10% or
0.1. A p-value less than 100 minus the confidence level [expressed as a
percentage] or 1 minus the confidence level [expressed as a probability]
is sufficient to reject the hypothesis.
Giving an interpretation of this, assuming the hypothetical mean [17
or greater] is true, the likelihood of generating a sample mean of 15.4
is only 0.0187 and this is far less than the 10% permissible outside of
90% confidence. Indeed, any sample mean more than 1.299 standard errors
below 17 would be too small to sustain the belief that the true mean is
17 or greater because qt(0.1, 49) is -1.299. Put in the
original metric, any sample mean below 16.0285747 would require a
rejection of the claim that the true mean is 17 or greater with 90%
confidence.
The Confidence Interval
The confidence interval is always centered on the sample mean.
Rearranging the equation above and solving for \(\mu\) given the \(t\) above, we get
With 90% confidence, given this sample mean, the true value should be
less than 16.37143.
The native t.test
t.test(cars$speed, conf.level = 0.9, alternative = "less", mu=17)
##
## One Sample t-test
##
## data: cars$speed
## t = -2.1397, df = 49, p-value = 0.01869
## alternative hypothesis: true mean is less than 17
## 90 percent confidence interval:
## -Inf 16.37143
## sample estimates:
## mean of x
## 15.4
Simplifying?
\[ t(\frac{s}{\sqrt{n}}) = \overline{x} -
\mu \] can lead to either:
\[ \overline{x} - t(\frac{s}{\sqrt{n}}) =
\mu \]
or
\[ \overline{x} = \mu +
t(\frac{s}{\sqrt{n}}) \]
So a minus \(t\) will be below \(\mu\) but above \(\overline{x}\) and a positive \(t\) will be above \(\mu\) but below \(\overline{x}\).
1. An hypothesis test given \(\mu\)
with an alternative that is less must then render an upper bound given
\(\overline{x}\).
2. An hypothesis test given \(\mu\)
with an alternative that is greater must then render a lower bound given
\(\overline{x}\).
A graphical representation
Given a sample size \(n\), some
unknown constant \(\mu\) and
satisfaction of Lindeberg’s condition, the sampling distribution of the
sample mean follows a \(t\)
distribution with degrees of freedom \(n-1\). To render a graphical
representation, let’s arbitrarily set n to 50, as in the above example.
Here is a plot.
plot(seq(-5,5, by=0.01), dt(seq(-5,5, by=0.01), df=49), xlab=expression(paste("x-bar -",mu," (measured in std. errors of the mean)", sep="")), ylab="Density", type="l")
Inverting the scale transformation
We can now reverse the scale by the standard error of the mean. In
the above example, it is 0.7478. Measured in miles per hour, we
obtain:
The probability of seeing such a small sample mean if the true
average is 17 is only 0.01869. The probability above the dotted black
line is 0.9 with 0.1 below. WIth 90% confidence, anything below this
would be sufficient evidence to reject the claim that the true average
is 17 or above.
The Confidence Interval
Let’s take the sample mean as the center and work out a confidence
interval at 90%. It’s exactly the 16.37143 gives above.
As an aside, 17 has exactly 0.01869 probability above it shown in
orange.
plot(x=15.4+seq(-5,5, by=0.01)*0.7478, dt(seq(-5,5, by=0.01), df=49), xlab=expression(paste(mu," | x-bar (measured in mph)", sep="")), ylab="Density", type="l")
abline(v=15.4, col="blue")
abline(v=15.4 - qt(0.1, df=49)*0.7478, col="black", lty=3)
polygon(x = c(11, 15.4+seq(-5,1.3, by=0.01)*0.7478), y = c(dt(seq(-5,1.3, by=0.01), df=49), 0), col = "blue")
polygon(x = c(15.4+seq(2.14,5, by=0.01)*0.7478, 17), y = c(dt(seq(2.14,5, by=0.01), df=49), 0), col = "orange")
Headings
This style provides first and second-level headings (that is,
# and ##), demonstrated in the next section.
You may get unexpected output if you try to use ### and
smaller headings.
In his later books1Beautiful
Evidence, Tufte starts each
section with a bit of vertical space, a non-indented paragraph, and sets
the first few words of the sentence in small caps. To accomplish this
using this style, call the newthought() function in
tufte in an inline R expression`r ` as demonstrated at the beginning of this paragraph.2 Note you should not assume tufte has
been attached to your R session. You should either
library(tufte) in your R Markdown document before you call
newthought(), or use tufte::newthought().
Figures
Margin Figures
Images and graphics play an integral role in Tufte’s work. To place
figures in the margin you can use the knitr chunk
option fig.margin = TRUE. For example:
Note the use of the fig.cap chunk option to provide a
figure caption. You can adjust the proportions of figures using the
fig.width and fig.height chunk options. These
are specified in inches, and will be automatically scaled down to fit
within the handout margin.
Arbitrary Margin Content
In fact, you can include anything in the margin using the
knitr engine named marginfigure. Unlike R
code chunks ```{r}, you write a chunk starting with
```{marginfigure} instead, then put the content in the
chunk. See an example on the right about the first fundamental theorem
of calculus.
We know from the
first fundamental theorem of calculus that for \(x\) in
\([a,
b]\): \[\frac{d}{dx}\left( \int_{a}^{x}
f(u)\,du\right)=f(x).\]
For the sake of portability between LaTeX and HTML, you should keep
the margin content as simple as possible (syntax-wise) in the
marginefigure blocks. You may use simple Markdown syntax
like **bold** and _italic_ text, but please
refrain from using footnotes, citations, or block-level elements
(e.g. blockquotes and lists) there.
Note: if you set echo = FALSE in your global chunk
options, you will have to add echo = TRUE to the chunk to
display a margin figure, for example
```{marginfigure, echo = TRUE}.
Full Width Figures
You can arrange for figures to span across the entire page by using
the chunk option fig.fullwidth = TRUE.
Other chunk options related to figures can still be used, such as
fig.width, fig.cap, out.width,
and so on. For full width figures, usually fig.width is
large and fig.height is small. In the above example, the
plot size is \(10 \times 2\).
Arbitrary Full Width Content
Any content can span to the full width of the page. This feature
requires Pandoc 2.0 or above. All you need is to put your content in a
fenced Div with the class fullwidth, e.g.,
::: {.fullwidth}
Any _full width_ content here.
:::
Below is an example:
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the GNU General
Public License versions 2 or 3. For more information about these matters
see https://www.gnu.org/licenses/.
Main Column Figures
Besides margin and full width figures, you can of course also include
figures constrained to the main column. This is the default type of
figures in the LaTeX/HTML output.
One of the most prominent and distinctive features of this style is
the extensive use of sidenotes. There is a wide margin to provide ample
room for sidenotes and small figures. Any use of a footnote will
automatically be converted to a sidenote. 3 This is a sidenote that was entered using a footnote.
If you’d like to place ancillary information in the margin without
the sidenote mark (the superscript number), you can use the
margin_note() function from tufte in an
inline R expression.
This is a margin note. Notice that there is no number
preceding the note. This function does not process the text with
Pandoc, so Markdown syntax will not work here. If you need to write
anything in Markdown syntax, please use the marginfigure
block described previously.
References
References can be displayed as margin notes for HTML output. For
example, we can cite R here (R Core Team 2022R Core Team. 2022. R: A Language and Environment for Statistical). To
enable this feature, you must set link-citations: yes in
the YAML metadata, and the version of pandoc-citeproc
should be at least 0.7.2. You can always install your own version of
Pandoc from https://pandoc.org/installing.html if the version is not
sufficient. To check the version of pandoc-citeproc in your
system, you may run this in R:
system2('pandoc-citeproc', '--version')
If your version of pandoc-citeproc is too low, or you
did not set link-citations: yes in YAML, references in the
HTML output will be placed at the end of the output document.
Tables
You can use the kable() function from the
knitr package to format tables that integrate well with
the rest of the Tufte handout style. The table captions are placed in
the margin like figures in the HTML output.
knitr::kable(
mtcars[1:6, 1:6], caption = 'A subset of mtcars.'
)
A subset of mtcars.
mpg
cyl
disp
hp
drat
wt
Mazda RX4
21.0
6
160
110
3.90
2.620
Mazda RX4 Wag
21.0
6
160
110
3.90
2.875
Datsun 710
22.8
4
108
93
3.85
2.320
Hornet 4 Drive
21.4
6
258
110
3.08
3.215
Hornet Sportabout
18.7
8
360
175
3.15
3.440
Valiant
18.1
6
225
105
2.76
3.460
Block Quotes
We know from the Markdown syntax that paragraphs that start with
> are converted to block quotes. If you want to add a
right-aligned footer for the quote, you may use the function
quote_footer() from tufte in an inline R
expression. Here is an example:
“If it weren’t for my lawyer, I’d still be in prison. It went a lot
faster with two people digging.”
Without using quote_footer(), it looks like this (the
second line is just a normal paragraph):
“Great people talk about ideas, average people talk about things, and
small people talk about wine.”
— Fran Lebowitz
Responsiveness
The HTML page is responsive in the sense that when the page width is
smaller than 760px, sidenotes and margin notes will be hidden by
default. For sidenotes, you can click their numbers (the superscripts)
to toggle their visibility. For margin notes, you may click the circled
plus signs to toggle visibility.
More Examples
The rest of this document consists of a few test cases to make sure
everything still works well in slightly more complicated scenarios.
First we generate two plots in one figure environment with the chunk
option fig.show = 'hold':
p <- ggplot(mtcars2, aes(hp, mpg, color = am)) +
geom_point()
p
p + geom_smooth()
Two plots in one figure environment.
Then two plots in separate figure environments (the code is identical
to the previous code chunk, but the chunk option is the default
fig.show = 'asis' now):
p <- ggplot(mtcars2, aes(hp, mpg, color = am)) +
geom_point()
p
Two plots in separate figure environments (the first plot).
p + geom_smooth()
Two plots in separate figure environments (the second plot).
You may have noticed that the two figures have different captions,
and that is because we used a character vector of length 2 for the chunk
option fig.cap (something like
fig.cap = c('first plot', 'second plot')).
Next we show multiple plots in margin figures. Similarly, two plots
in the same figure environment in the margin:
Two plots in one figure environment in
the margin.
p
p + geom_smooth(method = 'lm')
Then two plots from the same code chunk placed in different figure
environments:
knitr::kable(head(iris, 15))
Sepal.Length
Sepal.Width
Petal.Length
Petal.Width
Species
5.1
3.5
1.4
0.2
setosa
4.9
3.0
1.4
0.2
setosa
4.7
3.2
1.3
0.2
setosa
4.6
3.1
1.5
0.2
setosa
5.0
3.6
1.4
0.2
setosa
5.4
3.9
1.7
0.4
setosa
4.6
3.4
1.4
0.3
setosa
5.0
3.4
1.5
0.2
setosa
4.4
2.9
1.4
0.2
setosa
4.9
3.1
1.5
0.1
setosa
5.4
3.7
1.5
0.2
setosa
4.8
3.4
1.6
0.2
setosa
4.8
3.0
1.4
0.1
setosa
4.3
3.0
1.1
0.1
setosa
5.8
4.0
1.2
0.2
setosa
Two plots in separate figure
environments in the margin (the first plot).
p
knitr::kable(head(iris, 12))
Sepal.Length
Sepal.Width
Petal.Length
Petal.Width
Species
5.1
3.5
1.4
0.2
setosa
4.9
3.0
1.4
0.2
setosa
4.7
3.2
1.3
0.2
setosa
4.6
3.1
1.5
0.2
setosa
5.0
3.6
1.4
0.2
setosa
5.4
3.9
1.7
0.4
setosa
4.6
3.4
1.4
0.3
setosa
5.0
3.4
1.5
0.2
setosa
4.4
2.9
1.4
0.2
setosa
4.9
3.1
1.5
0.1
setosa
5.4
3.7
1.5
0.2
setosa
4.8
3.4
1.6
0.2
setosa
Two plots in separate figure
environments in the margin (the second plot).
p + geom_smooth(method = 'lm')
knitr::kable(head(iris, 5))
Sepal.Length
Sepal.Width
Petal.Length
Petal.Width
Species
5.1
3.5
1.4
0.2
setosa
4.9
3.0
1.4
0.2
setosa
4.7
3.2
1.3
0.2
setosa
4.6
3.1
1.5
0.2
setosa
5.0
3.6
1.4
0.2
setosa
We blended some tables in the above code chunk only as
placeholders to make sure there is enough vertical space among
the margin figures, otherwise they will be stacked tightly together. For
a practical document, you should not insert too many margin figures
consecutively and make the margin crowded.
You do not have to assign captions to figures. We show three figures
with no captions below in the margin, in the main column, and in full
width, respectively.
# a boxplot of weight vs transmission; this figure
# will be placed in the margin
ggplot(mtcars2, aes(am, wt)) + geom_boxplot() +
coord_flip()
# a figure in the main column
p <- ggplot(mtcars, aes(wt, hp)) + geom_point()
p
# a fullwidth figure
p + geom_smooth(method = 'lm') + facet_grid(~ gear)
Some Notes on Tufte CSS
There are a few other things in Tufte CSS that we have not mentioned
so far. If you prefer sans-serif fonts, use
the function sans_serif() in tufte. For
epigraphs, you may use a pair of underscores to make the paragraph
italic in a block quote, e.g.
I can win an argument on any topic, against any opponent. People
know this, and steer clear of me at parties. Often, as a sign of their
great respect, they don’t even invite me.
We hope you will enjoy the simplicity of R Markdown and this R
package, and we sincerely thank the authors of the Tufte-CSS and
Tufte-LaTeX projects for developing the beautiful CSS and LaTeX classes.
Our tufte package would not have been possible without
their heavy lifting.
You can turn on/off some features of the Tufte style in HTML output.
The default features enabled are:
If you do not want the page background to be lightyellow, you can
remove background from tufte_features. You can
also customize the style of the HTML page via a CSS file. For example,
if you do not want the subtitle to be italic, you can define
h3.subtitle em {
font-style: normal;
}
in, say, a CSS file my_style.css (under the same
directory of your Rmd document), and apply it to your HTML output via
the css option, e.g.,
There is also a variant of the Tufte style in HTML/CSS named “Envisoned CSS”.
This style can be used by specifying the argument
tufte_variant = 'envisioned' in tufte_html()4 The actual Envisioned CSS was not used in the
tufte package. We only changed the fonts, background
color, and text color based on the default Tufte style., e.g.
To see the R Markdown source of this example document, you may follow
this
link to Github, use the wizard in RStudio IDE
(File -> New File -> R Markdown -> From Template),
or open the Rmd file in the package: